active learning
"Compared with traditional methods, machine learning force fields (MLFFs) can predict material properties and reaction mechanisms faster and more accurately. State-of-the-art deep-learning-based molecular dynamics can simulate systems with billions of atoms. However, because machine learning methods are interpolative by nature, MLFFs struggle to make accurate predictions in regions of phase space outside the training set. Training data is usually generated with expensive first-principles calculations, so it is difficult to obtain a large, representative ab initio dataset without extensive computation. Improving the extrapolation capability of MLFF models without relying on large amounts of ab initio data is therefore crucial. PWact (Active learning based on PWMAT Machine Learning Force Field) is an open-source automated active learning platform based on PWMLFF, designed for efficient data sampling."
AL-PWMLFF
The AL-PWMLFF platform consists of two main components: the main task and the task scheduler, as shown in the architecture diagram.
The main task includes two modules, preparing pre-training data (init_bulk) and active learning (sampling). It is responsible for generating computational tasks and collecting results during the preparation of pre-training data and the active learning process. The task scheduler receives task scheduling requests and assigns tasks to the corresponding computing nodes based on the resource utilization and task resource requirements. After the tasks are executed, the task scheduler collects the execution results from the computing nodes and returns them to the main task program.
pre-training data preparation module
This module includes five sub-modules: relaxation (supporting PWMAT, VASP, CP2K, and DFTB), supercell generation, lattice scaling, lattice perturbation, and MD runs (supporting PWMAT, VASP, CP2K, and DFTB). The sub-modules can also be combined.
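To make the lattice scaling and perturbation steps concrete, here is a minimal sketch of what such operations typically do to a cell. It is not PWact's implementation: the function names, the uniform random strain, and the `max_strain` and `max_shift` parameters are assumptions chosen for illustration.

```python
import random

def scale_lattice(lattice, factor):
    """Scale all three lattice vectors by the same factor (isotropic scaling)."""
    return [[factor * x for x in vec] for vec in lattice]

def perturb_lattice(lattice, max_strain, rng):
    """Apply a small random strain: new_lattice = lattice @ (I + eps)."""
    eps = [[rng.uniform(-max_strain, max_strain) for _ in range(3)] for _ in range(3)]
    deform = [[(1.0 if i == j else 0.0) + eps[i][j] for j in range(3)] for i in range(3)]
    return [
        [sum(vec[k] * deform[k][j] for k in range(3)) for j in range(3)]
        for vec in lattice
    ]

def perturb_positions(positions, max_shift, rng):
    """Displace every Cartesian coordinate by at most max_shift."""
    return [[x + rng.uniform(-max_shift, max_shift) for x in pos] for pos in positions]

# Hypothetical example: a 4 Angstrom cubic cell with two atoms.
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
atoms = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
rng = random.Random(0)  # fixed seed so the perturbation is reproducible

scaled_cell = scale_lattice(cell, 0.98)
perturbed_cell = perturb_lattice(cell, 0.03, rng)
perturbed_atoms = perturb_positions(atoms, 0.05, rng)
```

Combining the steps (for example, scaling a relaxed supercell and then perturbing each scaled copy) is how a small number of input structures can be turned into a more varied initial dataset.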
active learning module
The active learning module consists of three sub-modules: training, configuration exploration, and annotation (supporting PWMAT, VASP, CP2K, and DFTB). First, the training module trains the model. The trained model is then passed to the exploration module, which uses the force field to run molecular dynamics simulations. After the simulation, the resulting trajectory is passed to the query step for uncertainty measurement. Once the query is complete, the configurations selected for labeling are sent to the annotation module. Finally, the annotation module performs self-consistent calculations to obtain energies and forces, which are stored as labels alongside the corresponding configurations in the annotated database. This process is repeated until convergence.
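The train, explore, query, and annotate cycle described above can be sketched as a plain loop. This is only an illustration of the control flow: the four callables are hypothetical stand-ins for the real sub-modules, and treating "no configurations selected" as convergence is an assumption, not PWact's documented criterion.

```python
def active_learning_loop(train, explore, query, label, dataset, max_iters=10):
    """Repeat train -> explore -> query -> label until nothing new is selected.

    All four callables are hypothetical stand-ins for the real sub-modules.
    Returns the final dataset and the number of labeling rounds performed.
    """
    for rounds in range(max_iters):
        model = train(dataset)                 # fit a force field on current labels
        trajectory = explore(model)            # MD driven by the trained model
        candidates = query(model, trajectory)  # keep only uncertain configurations
        if not candidates:                     # assumed convergence criterion
            return dataset, rounds
        dataset = dataset + label(candidates)  # add newly labeled configurations
    return dataset, max_iters


# Toy stand-ins: a "configuration" is an integer x and its label is x * x.
def toy_train(dataset):
    return {x for x, _ in dataset}             # the "model" is the set of known x

def toy_explore(model):
    return list(range(10))                     # the "trajectory" visits x = 0..9

def toy_query(model, trajectory):
    # Flag points farther than 1 from every known point as uncertain.
    return [x for x in trajectory if min(abs(x - k) for k in model) > 1]

def toy_label(candidates):
    return [(x, x * x) for x in candidates]

final_dataset, rounds = active_learning_loop(
    toy_train, toy_explore, toy_query, toy_label, dataset=[(0, 0)]
)
```

In the toy run, the first round labels every point the "model" is unsure about, and the second round finds nothing left to label, so the loop stops.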
- For model training, PWMLFF supports the DP model, the DP model with compression, the DP model with type embedding, and the NEP (NEP4) model.
- For uncertainty measurement, common methods based on multiple-model committee queries are provided, as well as our latest design, the single-model Kalman Prediction Uncertainty (KPU), which is based on Kalman filtering. KPU can reduce the computational cost of model training to 1/N, where N is the number of models in the committee query, while achieving accuracy close to that of the committee query. Users are welcome to try this method. KPU applies only to the DP model.
- For annotation, PWMAT or VASP is supported.
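As an illustration of a committee query, the sketch below computes a commonly used uncertainty indicator: the largest per-atom spread of the forces predicted by the committee members. It is not taken from the PWact source; the function names and the `lower` and `upper` thresholds are hypothetical, and the actual selection rule may differ.

```python
import math

def max_force_deviation(committee_forces):
    """Largest per-atom standard deviation of the predicted force vectors.

    committee_forces[m][i] is the (fx, fy, fz) force on atom i from model m.
    """
    n_models = len(committee_forces)
    n_atoms = len(committee_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [
            sum(committee_forces[m][i][k] for m in range(n_models)) / n_models
            for k in range(3)
        ]
        variance = sum(
            sum((committee_forces[m][i][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(variance))
    return worst

def select_for_labeling(deviations, lower, upper):
    """Return indices of configurations whose deviation lies in [lower, upper).

    Assumed convention: below `lower` the models already agree, and at or
    above `upper` the configuration is treated as unphysical.
    """
    return [idx for idx, dev in enumerate(deviations) if lower <= dev < upper]

# Two hypothetical models disagreeing on the force acting on a single atom.
forces = [
    [(1.0, 0.0, 0.0)],  # model 0
    [(3.0, 0.0, 0.0)],  # model 1
]
deviation = max_force_deviation(forces)
selected = select_for_labeling([0.01, 0.1, 0.5], lower=0.05, upper=0.3)
```

A committee of N models needs N training runs per iteration, which is the cost the single-model KPU approach is designed to avoid.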
Dependencies
- AL-PWMLFF job scheduling uses the SLURM cluster management and job scheduling system. SLURM must be installed on your computing cluster.
- DFT calculations in AL-PWMLFF support PWmat, VASP, CP2K, and DFTB. DFTB is integrated in PWmat; detailed usage instructions are in the DFTB_DETAIL section of the PWmat Manual.
- AL-PWMLFF model training is based on PWMLFF. Refer to the PWMLFF documentation for installation instructions (Download address for PWmat version integrated with DFTB).
- AL-PWMLFF LAMMPS molecular dynamics simulation is based on Lammps_for_pwmlff. Refer to the Lammps_for_pwmlff documentation for installation instructions.
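Before installing, it can help to confirm that the SLURM client tools are reachable from the login node. The snippet below is a small sketch of such a check; the list of commands is an assumption based on the standard SLURM client tools, not a list published by PWact.

```python
import shutil

# Standard SLURM client commands; which ones PWact actually calls is assumed.
REQUIRED_COMMANDS = ["sbatch", "squeue", "scancel"]

def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]

if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All SLURM client commands were found.")
```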
Installation Guide
PWact supports two installation methods: pip command installation and source code installation.
1. Pip Command Installation
2. Source Code Installation
Source Code Download
git clone https://github.com/LonxunQuantum/PWact.git
or
git clone https://gitee.com/pfsuo/pwact.git
The Gitee repository may not be updated as promptly as GitHub, so it is recommended to download from GitHub.
After downloading the source code, navigate to the root directory (the one containing setup.py) and run the following command:
pip install .
# Or use the editable (developer) install, which does not copy files. It reads directly from the source tree, so any change to the source code takes effect immediately. This is useful if you need to modify the source code yourself.
# pip install -e .
PWact is developed in Python and supports Python 3.9 and above. It is recommended to use the Python runtime environment of PWMLFF directly.
If you need to create a separate virtual environment for PWact, install only the following dependencies, using versions compatible with your Python version.
pip install numpy pandas tqdm pwdata
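A quick way to confirm that an environment meets these requirements is to check the interpreter version and look up each dependency without importing it. This is a sketch only; it assumes each package's import name matches its pip name, which should be verified for pwdata.

```python
import importlib.util
import sys

# Assumed import names, taken from the pip package names listed above.
REQUIRED_PACKAGES = ["numpy", "pandas", "tqdm", "pwdata"]

def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """True if the interpreter is at least the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum

def missing_packages(names, find_spec=importlib.util.find_spec):
    """Return the top-level packages that cannot be located."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```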
Command List
PWact provides the following commands, all of which start with pwact.
1. Output a list of available commands
pwact [ -h / --help / help ]
You can also use this command to check if PWact is installed successfully.
2. Output the parameter list for cmd_name
pwact cmd_name -h
3. Prepare initial training set
pwact init_bulk param.json resource.json
4. Active Learning
pwact run param.json resource.json
For the two commands above, you can rename the JSON files, but the input order must stay the same: param.json first, then resource.json.
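When these commands are launched from a script, a thin wrapper can keep the argument order fixed. The sketch below uses only the two subcommands documented above; the function names are hypothetical, and `run_pwact` assumes that pwact is installed and on PATH.

```python
import subprocess

def build_pwact_command(subcommand, param_json, resource_json):
    """Build the argument list with param.json before resource.json."""
    if subcommand not in ("init_bulk", "run"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["pwact", subcommand, param_json, resource_json]

def run_pwact(subcommand, param_json, resource_json):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    command = build_pwact_command(subcommand, param_json, resource_json)
    return subprocess.run(command, check=True)

# The file names are free to choose; only their order matters.
command = build_pwact_command("init_bulk", "my_init.json", "my_cluster.json")
```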
5. Utility Commands
Convert MOVEMENT or OUTCAR to PWdata format
pwact to_pwdata
Search for labeled datasets in the active learning directory
pwact gather_pwdata
End ongoing init_bulk tasks, such as relaxation and AIMD tasks
pwact kill init_bulk
End active learning tasks that are currently in progress, including training, exploration (MD), and labeling tasks
pwact kill run
Instead of the kill commands above, you can terminate tasks manually. First, end the running main process, that is, the terminal session executing pwact init_bulk or pwact run. Then, manually cancel the ongoing SLURM jobs.
Because manual operation may accidentally terminate your other processes, using the kill commands is recommended.
After ending a process with the command, check the command output and use a SLURM command (for example, squeue) to see whether any unfinished jobs remain.
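The follow-up check for leftover jobs can also be scripted. The sketch below calls squeue with its standard `-u`, `-h`, and `-o` options and parses the result; the function names are hypothetical, and telling PWact jobs apart from your other jobs (for example, by job name) is left to the reader.

```python
import getpass
import subprocess

def parse_squeue(output):
    """Parse lines of 'JOBID STATE NAME' into dictionaries."""
    jobs = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # the job name may contain spaces
        if len(parts) == 3:
            jobs.append({"id": parts[0], "state": parts[1], "name": parts[2]})
    return jobs

def remaining_jobs(user=None):
    """List the user's queued or running SLURM jobs (requires squeue on PATH)."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %T %j"],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(result.stdout)

# Parsing a hypothetical squeue output without touching the cluster.
sample = "123 RUNNING train\n124 PENDING scf job\n"
jobs = parse_squeue(sample)
```

Any job that still appears after a kill command can then be cancelled with scancel followed by its job ID.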
Input Files
AL-PWMLFF requires two input files, param.json and resource.json, for initial dataset preparation or active learning. Keys in both JSON files are case-insensitive.
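Case-insensitive keys are often handled by normalizing every key when the file is loaded. The sketch below shows that idea in isolation; it is not PWact's loader, and the key names in the example are made up.

```python
import json

def lower_keys(value):
    """Recursively lowercase every dictionary key; values are left untouched."""
    if isinstance(value, dict):
        return {str(key).lower(): lower_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lower_keys(item) for item in value]
    return value

def load_config(path):
    """Load a JSON file and normalize its keys to lowercase."""
    with open(path, "r", encoding="utf-8") as handle:
        return lower_keys(json.load(handle))

# Hypothetical keys, used only to show the normalization.
raw = json.loads('{"Train": {"Model_Type": "DP"}, "STEPS": [{"Name": "md"}]}')
config = lower_keys(raw)
```

Only keys are normalized, so string values such as model names or file paths keep their original case.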
param.json
Initial Training Set Preparation - init_param.json
Input configurations (in VASP or PWMAT format) and settings for relaxation, supercell generation, scaling, perturbation, and AIMD (DFTB, PWMAT, VASP).
Active Learning - run_param.json
Training settings (network structure, optimizer), exploration settings (LAMMPS settings, sampling strategies), and labeling settings (VASP/PWMAT self-consistent calculation settings).
resource.json
Settings for computational cluster resources, including computing nodes, CPU, GPU resources for training, molecular dynamics (MD), DFT calculations (SCF, Relax, AIMD), and corresponding software (LAMMPS, VASP, PWMAT, PWMLFF).